ck_tile grouped gemm: more padding#574

Open
matthiasdiener wants to merge 10 commits into dev from mdiener/cktile-grouped-gemm-padding

Conversation

@matthiasdiener
Contributor

@matthiasdiener matthiasdiener commented May 5, 2026

Description

Enabling padding unconditionally causes a significant (~15%) slowdown, so enable it only when necessary.
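
As a rough illustration of what "only when necessary" could mean, the sketch below assumes padding is needed whenever a GEMM dimension is not a multiple of the kernel's tile size in that dimension. The names `TileShape`, `needs_padding`, and `group_needs_padding`, as well as the tile sizes, are hypothetical and are not taken from ck_tile or from this PR; whether the actual dispatch decides per group or per call is not stated here.

```python
from typing import Iterable, NamedTuple, Tuple


class TileShape(NamedTuple):
    """Hypothetical tile sizes; the real kernel's values may differ."""
    m: int
    n: int
    k: int


def needs_padding(m: int, n: int, k: int, tile: TileShape) -> bool:
    """Return True if any dimension is not a multiple of its tile size."""
    return (m % tile.m != 0) or (n % tile.n != 0) or (k % tile.k != 0)


def group_needs_padding(shapes: Iterable[Tuple[int, int, int]],
                        tile: TileShape) -> bool:
    """Assumption: one padded/unpadded choice is made for the whole group,
    so padding is required as soon as a single (M, N, K) problem needs it."""
    return any(needs_padding(m, n, k, tile) for (m, n, k) in shapes)


# Illustrative tile sizes only.
tile = TileShape(m=256, n=128, k=64)

aligned = [(512, 256, 128), (256, 128, 64)]
ragged = [(512, 256, 128), (300, 256, 128)]  # M = 300 is not a multiple of 256

use_padding_aligned = group_needs_padding(aligned, tile)
use_padding_ragged = group_needs_padding(ragged, tile)
```

Under these assumptions, the aligned group would take the faster unpadded path and only the ragged group would pay the padding cost.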

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@matthiasdiener matthiasdiener requested a review from sudhu2k May 5, 2026 00:15
@matthiasdiener matthiasdiener self-assigned this May 5, 2026
@matthiasdiener matthiasdiener requested review from aris134 May 5, 2026 15:36
Comment thread transformer_engine/common/gemm/ck_grouped_gemm/ck_grouped_gemm_common.h Outdated
Comment thread tests/pytorch/test_numerics.py Outdated
Comment thread tests/pytorch/test_numerics.py Outdated
@matthiasdiener matthiasdiener added the ci-level 1 CI test level 1 label May 15, 2026
@matthiasdiener matthiasdiener marked this pull request as ready for review May 15, 2026 22:24
@matthiasdiener matthiasdiener requested a review from aris134 May 15, 2026 22:24
@matthiasdiener matthiasdiener requested a review from aris134 May 19, 2026 19:13
Comment thread tests/pytorch/test_numerics.py Outdated
Comment thread tests/pytorch/test_numerics.py Outdated
Comment thread tests/pytorch/test_numerics.py Outdated
Comment thread transformer_engine/common/gemm/ck_grouped_gemm/ck_grouped_gemm_fp16.cpp Outdated
Comment thread tests/pytorch/test_numerics.py Outdated
@matthiasdiener matthiasdiener requested a review from aris134 May 20, 2026 22:06
Contributor

@aris134 aris134 left a comment

LGTM!

@matthiasdiener matthiasdiener added ci-level 3 CI test level 3 and removed ci-level 1 CI test level 1 labels May 23, 2026
@aris134
Contributor

aris134 commented May 27, 2026

Quick follow-up question: are there certain padding cases/shapes where we should prefer fallback due to the performance penalty of the padded path?

@matthiasdiener
Contributor Author

> Quick follow-up question: are there certain padding cases/shapes where we should prefer fallback due to the performance penalty of the padded path?

I looked at this briefly but could not find a config where falling back would be profitable, at least for bf16.

Labels

ci-level 3 CI test level 3

2 participants